Non-equilibrium Surface Growth and 
Scalability of Parallel Algorithms for Large 
Asynchronous Systems 



G. Korniss 1 , M. A. Novotny 1 , Z. Toroczkai 2 , and P. A. Rikvold 1 ' 3 

1 School of Computational Science and Information Technology, 
Florida State University, Tallahassee, Florida 32306-4120, USA 

2 Department of Physics, University of Maryland, 
College Park, MD 20742-4111, USA 

3 Department of Physics and Center for Materials Research and Technology, 
Florida State University, Tallahassee, Florida 32306-4350, USA 

Abstract. The scalability of massively parallel algorithms is a fundamental ques- 
tion in computer science. We study the scalability and the efficiency of a conserva- 
tive massively parallel algorithm for discrete-event simulations where the discrete 
events are Poisson arrivals. The parallel algorithm is applicable to a wide range of 
problems, including dynamic Monte Carlo simulations for large asynchronous sys- 
tems with short-range interactions. The evolution of the simulated time horizon is 
analogous to a growing and fluctuating surface, and the efficiency of the algorithm 
corresponds to the density of local minima of this surface. In one dimension we find 
that the steady state of the macroscopic landscape is governed by the Edwards- 
Wilkinson Hamiltonian, which implies that the algorithm is scalable. Preliminary 
results for higher-dimensional logical topologies are discussed. 

1 Introduction 

Dynamic Monte Carlo (MC) simulations are invaluable tools for investigating 
the evolution of complex systems. For a wide range of systems it is plausi- 
ble to assume (and in rare cases it is possible to derive) that attempts to 
update the state of the system form a Poisson process. The basic notion is 
that time is continuous, and the discrete events (update attempts) occur in- 
stantaneously. The state of the system remains constant between events. It 
is worthwhile to note that the standard random-sequential update schemes 
(easily implementable on serial computers) produce this dynamics for "free:" 
the waiting-time distribution for the attempts to update each subsystem or 
component is geometrical and approaches the exponential distribution in the 
large-system limit. This uniquely characterizes the Poisson process. 

The parallel implementation of these dynamic MC algorithms belongs 
to the class of parallel discrete-event simulation, which is one of the most 
challenging areas in parallel computing Q . The numerous applications range 
from the natural sciences and engineering to computer science and queueing 
networks. For example, in lattice Ising models the discrete events are spin-flip 
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attempts, while in queueing systems they are job arrivals. The difficulty of 
parallel discrete-event simulations is that update attempts are not synchro- 
nized by a global clock. In fact, the traditional dynamic MC algorithms were 
long believed to be inherently serial, i.e., in spin language, the correspond- 
ing algorithm was thought to be able to update only one spin at a time. 
However, Lubachevsky presented an approach for parallel simulation of these 
systems [0 without changing the underlying Poisson process. Applications 
include modeling of cellular communication networks Q , particle deposition 
[f| , and metastability and hysteresis in kinetic Ising models || . 

In a distributed massively parallel scheme each processing element (PE) 
carries a subsystem of the full system. The parallel algorithm must concur- 
rently advance the Poisson streams corresponding to each subsystem without 
violating causality. This requires the concept of local simulated time, as well as 
a synchronization scheme. Intuitively it is clear that systems with short-range 
interactions contain a "substantial" amount of parallelism. For the "conser- 
vative" approach [|| , the efficiency of the algorithm is simply the fraction of 
PEs that are guaranteed to attempt the update without breaking causality. 
The rest of the PEs must idle. 

2 Time Horizon Evolution and Efficiency Modeling 

We consider a d-dimensional hypercubic regular lattice topology where the 
underlying physical system has only nearest-neighbor (nn) interactions (e.g., 
Glauber spin-flip dynamics) and periodic boundary conditions. The scalabil- 
ity analysis is made for the "worst-case" scenario in which each PE hosts a 
single site (e.g., one spin) of the underlying system. While this may be the 
only scenario for a special-purpose computer with extremely limited local 
memory, on architectures with relatively large memory one PE can host a 
block of sites. This substantially increases the efficiency, bringing it to the 
level of practical applicability M. 

In the basic parallel scheme [|2j , each PE generates its own local simulated 
time for the next update attempt. The set of local times {n(t)}f =1 constitute 
the simulated time horizon. Here, L is the linear size of the lattice {L d is the 
number of PEs), and t is the index of the simultaneously performed parallel 
steps. Initially, Ti(0)=0 for every site. At each parallel time step, only those 
PEs for which the local simulated time is not greater than the local simulated 
times of their nn can attempt the update and increment their local time by an 
exponentially distributed random amount, r)i(t). Without loss of generality 
we take (?7i(t))=l. The other PEs idle. Due to the continuous nature of the 
random simulated times, for t>0 the probability of equal-time updates for any 
two sites is of measure zero. The comparison of the nn simulated times and 
idling if necessary enforces causality. Since at worst the PE with the absolute 
minimum simulated time makes progress, the algorithm is free from dead- 
lock. For this basic conservative scheme, the theoretical efficiency (ignoring 
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Fig. 1. Evolution of the simulated time horizon in one dimension: (a) surface width; 
(b) density of local minima; (c) Steady-state scaling of the parallel efficiency (den- 
sity of local minima). Note the small vertical scales in (b) and (c) 

communication overheads) is simply the fraction of non-idling PEs. This cor- 
responds to the density of local minima of the simulated time horizon. Note 
that the evolution of the simulated time horizon is completely independent of 
the underlying model (except for its topology) and can be written as: 

T i (t + i)=n(t)+ [] efoW-TiWMt). (i) 

Here Df n is the set of nearest neighbors of site i, and ©(■) is the Heaviside step 
function. The evolution of the simulated time horizon is clearly analogous to 
an irreversibly growing and fluctuating surface. 

There are two important quantities to study. The first is the density of 
local minima, {u(t))L, in particular its asymptotic (or steady-state) value and 
finite-size effects. It corresponds directly to the efficiency of the algorithm. 
The second is the surface width, (w 2 (t)) = (l/L d )(J2f = i bi(t) - ~r(t)] 2 ), where 

L d 

f{t) = {l/L d )Y J ^ =1 n{t). It describes the macroscopic roughness of the time 
horizon and has important consequences for actual implementations J(| (e.g., 
optimal buffer size for a collecting statistics network H). 

For d—1 we showed pj by coarse-graining and direct simulation of (^) 
that the evolution of the simulated time horizon belongs to the KPZ uni- 
versality class H . Our simulation confirmed that before reaching the steady 
state, (u> 2 (i))~P /3 with /3~l/3 [Fig. 1(a)]. At the same time the density of 
local minima, (u(t))i,, decreases monotonically with time towards a long-time 
asymptotic limit well separated from zero [Fig. 1(b)]. The steady state is gov- 
erned by the Edwards- Wilkinson Hamiltonian H, and the stationary width 
scales as (w 2 )^L 2a , where a=l/2 is the roughness exponent. This guaran- 
tees that the coarse-grained landscape is a simple random-walk surface; the 
local slopes are short-range correlated. Thus, the density of local minima is 
non-zero. The non-zero density of local minima is a universal characteristic of 
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Fig. 2. Evolution of the simulated time horizon in d=2 and d=3: (a) surface width 
in d=2; (b) density of local minima in d=2; (c) surface width in d=3; (d) density 
of local minima in d—3. 



this class J7|,[10|. Further, its steady- state finite-size effects can be written as 
( u )l — (u)oo + const/L. The 0(1/1/) correction in d=l appears to be rather 
robust for periodic boundary conditions . The extrapolated value for the 
efficiency is (u) oo =0.2464(l) [Fig. 1(c)]. 

In higher dimensions we observe the same qualitative behavior as for 
d=l. The surface roughens and saturates for any finite system, as seen in 
Fig. 2(a,c). Simultaneously, the density of local minima decreases monotoni- 
cally towards its asymptotic (t— k») finite-size value [Fig. 2(b,d)]. Again, the 
steady-state density of local minima appears to be well separated from zero. 
For d=2 (u) oo «0.12, and for d=3 (u) oo «0.075. The (^^O^/K) behav- 
ior appears to be rather general Q, where K=2d is the number of nearest 
neighbors on a regular lattice. 

Similar to the d=l case, corrections to scaling are very strong, both for the 
surface width and the density of local minima. While for d—1 we were able to 
simulate large systems (L^IO 3 ) to obtain the KPZ scaling exponents and the 
steady-state finite-size behavior of (u) l , in higher dimensions the relatively 
small system sizes prevented us from extracting the scaling behavior of the 
width and the finite-size effects of the density of local minima. 

We conjecture that the simulated time horizon exhibits KPZ-like evolution 
in higher dimensions as well. While the scalability of the parallel scheme for 
random topology has been investigated earlier |J , the underlying mechanism 
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for regular lattices (macroscopic roughening) was only recently pointed out 
. The major implication is that while the algorithm is scalable, and the 
width of the simulated time horizon saturates for any finite number of PEs, 
it diverges in the limit of an infinite number of PEs. Thus, in an actual 
implementation the programmer has to take some actions to handle statistics 
collection. 



3 Summary and Outlook 

The analogy between the evolution of the simulated time horizon and non- 
equilibrium surface growth illustrates that when devising parallel algorithms, 
the programmer has to be very much concerned with the morphological prop- 
erties of the associated surface. To fully describe the evolution of the simu- 
lated time horizon and the efficiency of the algorithm, one must focus on both 
the long wave-length behavior (the macroscopic width) and short-distance 
properties (the density of local minima). While most previous surface studies 
focus on the long wave-length properties, physics at short distances is just as 
challenging, has real applications, and may reveal universal features J? 10 . 
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